Interactive Visualization for Computational Linguistics
Abstract
Interactive information visualization is an emerging and powerful research technique for understanding models of language and their abstract representations. Much of what computational linguists rely on to improve NLP applications and to model language “understanding” is structure that has, at best, only indirect attestation in observable data. Our research progress therefore depends in large part on our ability to investigate, explain, and explore these structures thoroughly, both empirically and relative to accepted linguistic theory. The sheer complexity of these abstract structures, and of the observable patterns on which they are based, usually limits their accessibility, often even to the researchers who create them or attempt to learn them. To aid understanding, visual ‘externalizations’ are used for presentation and explanation: traditional statistical graphs and custom-designed illustrations fill the pages of ACL papers. These visualizations provide post hoc insight into the representations and algorithms that researchers design, but visualization can also assist the research process itself. In fact, there are statistical methods, grouped under the rubric of “exploratory data analysis,” and visualization techniques designed for exactly this purpose, but they are not widely used or even known in CL. These techniques can reveal structure and detail in data before anyone has noticed them. When observing natural language engineers at work, we also notice that, even without formal training in visualization, they often draw sketches to help them understand and communicate complex structures. Such sketches are ad hoc visualizations, but they, too, can be extended by drawing on current information visualization research.
This tutorial will enable members of the ACL community to apply information visualization theory to exploratory data analysis, algorithm design, and data presentation in their own research. We draw on fundamental studies in cognitive psychology to introduce ‘visual variables’: visual dimensions, such as position, size, and color, on which data can be encoded. We also discuss how interaction and animation can enhance the usability and usefulness of visualizations. Topics covered include a review of information visualization techniques applicable to CL, pointers to existing visualization tools and programming toolkits, and new directions in visualizing CL data and results. In addition, we discuss the challenges of evaluating visualizations, noting how they differ from the evaluation methods traditionally used in CL, and present some heuristic approaches and techniques for measuring insight. Information visualizations in CL research can also be judged by their impact on algorithm and data structure design. The field of information visualization offers many opportunities for more creative visualizations that benefit from the CL community’s deeper collective understanding of natural language. Because most visualizations of language are created by researchers with little or no linguistic expertise, we will cover some open and very promising possibilities for improving the state of the art in text-based visualizations.
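As a small illustration of encoding data on a visual variable, the sketch below maps each word's corpus frequency onto font size, the kind of encoding used in tag clouds. The helper name `encode_size`, the linear scale, and the 10 to 48 pt range are hypothetical choices made for this example; the tutorial itself does not prescribe any particular mapping.

```python
from collections import Counter


def encode_size(text, min_pt=10.0, max_pt=48.0):
    """Hypothetical helper: encode word frequency on the 'size' visual variable.

    Counts whitespace-separated, lowercased tokens and maps each count
    linearly onto a font size between min_pt and max_pt.
    """
    counts = Counter(text.lower().split())
    if not counts:
        return {}
    lo, hi = min(counts.values()), max(counts.values())
    span = (hi - lo) or 1  # avoid division by zero when all counts are equal
    return {
        word: min_pt + (count - lo) / span * (max_pt - min_pt)
        for word, count in counts.items()
    }


if __name__ == "__main__":
    sizes = encode_size("the cat saw the dog and the dog saw the cat")
    # Print words from largest to smallest encoded size.
    for word, pt in sorted(sizes.items(), key=lambda kv: -kv[1]):
        print(f"{word:>4}: {pt:.1f}pt")
```

Here the most frequent word ("the") receives the maximum size and the least frequent ("and") the minimum; other visual variables, such as color or position, could be driven by the same counts in the same way.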
Similar resources
Diachronic-information visualization in historical dictionaries
The field of computational linguistics has been dealing with the modeling of natural language from a computational perspective since the 1950s. However the usage of advanced and interactive visualization techniques is very limited. This is especially the case of diachronic linguistics, which is devoted to the study of language change. This work is part of a project that aims to provide novel hi...
Interactive Text Mining Suite: Data Visualization for Literary Studies
In recent years, there has been growing interest in visualization methods for literary text analysis. While text mining and visualization tools have evolved into mainstream research methods in many fields (e.g. social sciences, machine learning), their application to literary studies still remains infrequent. In addition to technological challenges, the use of these tools requires a methodologi...
Representation and Processing of Complex DNA Spatial Architecture and its Annotated Genomic Content
This paper presents a new general approach for the spatial representation and visualization of DNA molecule and its annotated information. This approach is based on a biological 3D model that predicts the complex spatial trajectory of huge naked DNA. With such modeling, a global vision of the sequence is possible, which is different and complementary to other representations as textual, linguis...
A Love-Hate Relationship for Big Data and Linguistics: Present Issues and Future Possibilities
In this paper, we present an overview of some issues related to the use of Big Data in the area of Linguistics that have been debated in workshops and conferences in the last two years. We also consider some requirements that “big” linguistic databases should have in order to tackle some of these issues; finally, we discuss a set of possible interactive visualization approaches of large dataset...
Invited Talk: Processes that Shape Conversation and their Implications for Computational Linguistics
Experimental studies of interactive language use have shed light on the cognitive and interpersonal processes that shape conversation; corpora are the emergent products of these processes. I will survey studies that focus on under-modelled aspects of interactive language use, including the processing of spontaneous speech and disfluencies; metalinguistic displays such as hedges; interactive pro...
A Critical Review of Information Visualizations for Natural Language
Information visualization is characterized by a collection of design principles and evaluation frameworks, rooted in cognitive science research, human-computer interaction studies, and experiences from the fine arts community. Information visualization has been coupled with computational linguistics algorithms to present visualizations of many types of linguistic data. Evaluation is often not c...
Publication date: 2008